Abstract

Classical confidence limits are compared to Bayesian error bounds by studying relevant examples. The performance of the two methods is investigated relative to the properties coherence, precision, bias, universality, simplicity. A proposal to define error limits in various cases is derived from the comparison. It is based on the likelihood function only and follows in most cases the general practice in high energy physics. Classical methods are discarded because they violate the likelihood principle, can produce physically inconsistent results, and suffer from a lack of precision and generality. Also the extreme Bayesian approach with arbitrary choice of the prior probability density or priors deduced from scaling laws is rejected.

1. Purpose, criteria, definitions

The progress of experimental sciences to a large extent is due to their practice to assign uncertainties to results. The information contained in a measurement, or a parameter deduced from it, is incompletely documented and more or less useless unless some kind of error is attributed to the data. The precision of measurements has to be known i) to combine data from different experiments, ii) to deduce secondary parameters from it and iii) to test predictions of theories. Different statistical methods have to be judged on their ability to fulfill these tasks.

Narsky [1], who compares several different approaches to the estimation of upper Poisson limits, states: "There is no such thing as the best procedure for upper limit estimation. An experimentalist is free to choose any procedure she/he likes, based on her/his belief and experience. The only requirement is that the chosen procedure must have a strict mathematical foundation." This opinion is typical of many papers on confidence limits. However, "the real test of the pudding is in its eating" and not in contemplating the beauty of the cooking recipe. We should not forget that what we measure has practical implications.

In this paper, the emphasis is put on performance and not on the mathematical and statistical foundation. The intention is to confront the procedures with the problems to be solved in physics. Simple transparent examples are selected. Important properties are among others consistency, precision, universality, simplicity and objectivity.

Consistency is indispensable in any case. A. W. F. Edwards writes [2]: "Relative support (of a hypothesis or a parameter) must be consistent in different applications, so that we are content to react equally to equal values, and it must not be affected by information judged intuitively to be irrelevant." Part of the content of this article has been presented in a comment [3] to the unified approach [4].

1.1 Classical confidence limits 

Classical confidence limits (CCL) are based on tail probabilities. The defining property is coverage: If a large number of experiments perform measurements of a parameter with confidence level $\alpha$, the fraction $\alpha$ of the confidence intervals will contain the true value of the parameter.
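
The coverage property can be checked numerically. The following is a minimal sketch, assuming a Gaussian measurement with known resolution and central intervals; the function names `central_interval` and `coverage` and the chosen numbers are illustrative and do not come from the paper.

```python
import random
from statistics import NormalDist


def central_interval(x, sigma, level):
    """Central classical interval for a Gaussian measurement x with known sigma."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return x - z * sigma, x + z * sigma


def coverage(true_mu, sigma, level, n_experiments, seed=1):
    """Fraction of simulated experiments whose interval contains true_mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        x = rng.gauss(true_mu, sigma)          # one simulated measurement
        lower, upper = central_interval(x, sigma, level)
        if lower <= true_mu <= upper:
            hits += 1
    return hits / n_experiments


if __name__ == "__main__":
    # The printed fraction should be close to the nominal level 0.90.
    print(coverage(true_mu=3.0, sigma=1.0, level=0.90, n_experiments=20000))
```

In this continuous, unbounded case the simulated fraction agrees with the nominal level up to statistical fluctuations; the examples in Section 2 concern situations where this simple picture fails.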
Fig. 1: Two parameter classical confidence limit for a measurement $(x_1, x_2)$. The dashed contours labeled with small letters in the sample space correspond to probability contours of the parameter pairs labeled with capital letters in the parameter space.



We illustrate the concept of CCL for a measurement (statistic) consisting of a two-dimensional observation $(x_1, x_2)$ and a two-dimensional parameter space (see Fig. 1). In a first step we associate to each point $(\theta_1, \theta_2)$ in the parameter space a closed probability contour in the sample space containing a measurement with probability $\alpha$. For example, the probability contour labeled a in the sample space corresponds to the parameter values of point A in the parameter space. The curve (confidence contour) connecting all points in the parameter space with probability contours in the sample space passing through the actual measurement $(x_1, x_2)$ encloses the confidence region of confidence level $\alpha$.

Figure 1 demonstrates some of the requirements necessary for the construction of an exact confidence region: 1. The sample space must be continuous. (Discrete distributions and thus all digital measurements and in principle also Poisson processes are excluded.) 2. The probability contours should enclose a simply connected region. 3. The parameter space has to be continuous.

The restriction (1) usually is overcome by relaxing the requirement of exact coverage and by 
requiring minimum overcoverage. This is not an elegant solution. 

There is considerable freedom in the choice of the probability contours, but to ensure coverage they have to be defined independently of the result of the experiment. Usually, contours are locations of constant probability density. In one dimension also central intervals and intervals leading to minimum sized confidence intervals are popular. Clearly, there is a lack of standardization. The unified approach [4] defines the probability regions through the likelihood ratio.

1.2 Likelihood limits and Bayesian conventions 

Likelihood intervals enclose a region where the likelihood function decreases by a fixed ratio, equal to $\sqrt{e}$ for one standard deviation, $e^2$ for two standard deviations, etc.
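
A drop of the likelihood by the factor $\sqrt{e}$ corresponds to a decrease of the log-likelihood by 1/2. The following minimal sketch finds such limits for a Poisson mean with $n \ge 1$ observed events; the choice of the Poisson case, the helper `bisect_root` and all names are assumptions made for illustration only.

```python
import math


def log_likelihood(mu, n):
    """Poisson log-likelihood for mean mu and n observed events (constant dropped)."""
    return n * math.log(mu) - mu


def bisect_root(func, lo, hi, tol=1e-10):
    """Root of func on [lo, hi]; func must change sign exactly once in between."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def likelihood_interval(n, n_sigma=1.0):
    """Limits where ln L has dropped by n_sigma**2 / 2 from its maximum at mu = n."""
    target = log_likelihood(n, n) - 0.5 * n_sigma ** 2

    def shifted(mu):
        return log_likelihood(mu, n) - target

    lower = bisect_root(shifted, 1e-9, n)
    upper = bisect_root(shifted, n, n + 10.0 * math.sqrt(n) + 10.0)
    return lower, upper


if __name__ == "__main__":
    print(likelihood_interval(4))        # one standard deviation, asymmetric
    print(likelihood_interval(4, 2.0))   # two standard deviations
```

The resulting interval is asymmetric around the maximum, as is typical for likelihood limits at small event numbers.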

Bayesians integrate the normalized likelihood function and form either probability regions or moments to define the limits. I will discuss only uniform prior densities. This does not restrict the freedom of the scientist because there is the equivalent possibility to choose the parameter. For example, an analysis using the mean life parameter $\tau$ with the prior $1/\tau^2$ is equivalent to an analysis of the decay constant $\gamma$ with uniform prior.
Fig. 2: Likelihood limits (left) and Bayesian limits (right).



1.3 The likelihood principle 

Assume we have two hypotheses characterized by the parameters $\theta_1$ and $\theta_2$. For a measurement $x_1$ the relative support to the two hypotheses is given by the likelihood ratio

$$\frac{P(x_1|\theta_1)}{P(x_1|\theta_2)}.$$

Another measurement $x_2$ is equivalent to $x_1$ if the likelihood ratios are the same:

$$\frac{P(x_1|\theta_1)}{P(x_1|\theta_2)} = \frac{P(x_2|\theta_1)}{P(x_2|\theta_2)}.$$

When we have more than two hypotheses we require that equivalent data provide the same likelihood ratio for all combinations of parameters. Consequently, for a pdf depending on a continuous parameter $\theta$, we have to require that the likelihood functions for the two measurements are proportional to each other. These considerations correspond to the Likelihood Principle (LP): The likelihood function contains the full information relative to the parameter. Inference should be based on the likelihood function only. The LP is due to Fisher, Birnbaum and others. Proofs and discussions can be found in the literature.

Methods that provide different results for measurements that have proportional likelihood functions are inconsistent.



2. Examples 

2.1 Example 1a: Gaussian with physical boundary

A physical quantity like the mass of a particle, measured with a resolution following a normal distribution, is constrained to positive values. Fig. 3 shows typical central confidence bounds which extend into the unphysical region. In extreme cases a measurement may produce a 90% confidence interval which does not cover positive values at all. The unified approach and the Bayesian method avoid unphysical confidence limits.
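
The contrast between a classical and a Bayesian upper limit near the boundary can be sketched as follows. The example assumes a one-sided 90% classical limit and a Bayesian limit from a uniform prior restricted to non-negative values; the function names and the test value $x = -2\sigma$ are illustrative choices, not taken from the paper.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def classical_upper_limit(x, sigma, cl=0.90):
    """One-sided classical upper limit; it ignores the boundary at zero."""
    return x + _STD.inv_cdf(cl) * sigma


def bayesian_upper_limit(x, sigma, cl=0.90):
    """Upper limit from the Gaussian likelihood normalized over mu >= 0 (uniform prior)."""
    below = _STD.cdf(-x / sigma)            # likelihood fraction in the unphysical region
    target = below + cl * (1.0 - below)     # cumulative value reached at the limit
    return x + _STD.inv_cdf(target) * sigma


if __name__ == "__main__":
    x, sigma = -2.0, 1.0                    # measurement two sigma below the boundary
    print(classical_upper_limit(x, sigma))  # negative, i.e. entirely unphysical
    print(bayesian_upper_limit(x, sigma))   # positive
```

Far from the boundary the two limits coincide, because the fraction of the likelihood in the unphysical region becomes negligible.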
Fig. 3: Gaussian errors near physical boundary (c: classical, cu: unified, l: likelihood, b: Bayesian). Left: 68.3% errors, right: 90% upper limits.

Fig. 4: Disconnected probability intervals in the unified approach. Gaussians (top) and Breit-Wigner (bottom).



2.2 Example 1b: Superposition of Gaussians in the unified approach

The prescription for the construction of the probability intervals according to the likelihood ratio ordering 
leads to disconnected interval regions when the pdf has tails and cannot produce confidence intervals. 
This is shown in Fig. 4 top. 

2.3 Example 1c: Breit-Wigner distribution

The same difficulty arises for the Breit-Wigner distribution (see Fig. 4 bottom). 

The problem is absent if the pdf $f$ fulfills the condition $-d^2 \ln f / dx^2 > 0$. This condition restricts the application of the unified approach to pdfs similar to Gaussians.
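
Reading the condition above as the requirement that the second derivative of $\ln f$ be negative everywhere, the difference between a Gaussian and a Breit-Wigner shape can be sketched with the analytic second derivatives. The parametrizations (unit width, zero mean) and the function names are assumptions for illustration.

```python
def d2_log_gauss(x, mean=0.0, sigma=1.0):
    """Second derivative of ln f for a Gaussian: constant and negative."""
    return -1.0 / sigma ** 2


def d2_log_breit_wigner(x, mass=0.0, width=1.0):
    """Second derivative of ln f for f proportional to 1 / ((x - mass)^2 + (width/2)^2)."""
    a = 0.5 * width
    u = x - mass
    return 2.0 * (u * u - a * a) / (u * u + a * a) ** 2


def satisfies_condition(d2_log_pdf, xs):
    """True if the second derivative of ln f is negative at every tested point."""
    return all(d2_log_pdf(x) < 0.0 for x in xs)


if __name__ == "__main__":
    grid = [0.1 * i for i in range(-50, 51)]
    print(satisfies_condition(d2_log_gauss, grid))         # holds everywhere
    print(satisfies_condition(d2_log_breit_wigner, grid))  # fails in the tails
```

For the Breit-Wigner shape the sign changes at a distance of half the width from the peak, which is where the tails begin to dominate.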

2.4 Example 2: Gaussian in two dimensions and physical boundary

Let us assume that we have a Gaussian resolution in x, y and a physical boundary in y (Fig. 5). The 
probability contours are deformed in the unified approach as indicated in the sketch. As a consequence 
the error in x shrinks due to a boundary in y even though the two parameters are independent. One 
has to be careful in the interpretation of two-dimensional confidence limits as they occur for example in 
neutrino oscillation experiments. 

2.5 Example 3: Slope of a linear distribution 

This is a frequent distribution in particle physics. A linear distribution is always restricted in the sample 
and the parameter space to avoid negative probabilities. We choose 

$$f(x|\theta) = \frac{1}{2}(1 + \theta x), \qquad -1 < \theta, x < 1$$

Fig. 5: Probability contours (schematic) for a two-dimensional Gaussian near a boundary in the unified approach.

Fig. 6: Likelihood for a slope parameter.



as is realized in many asymmetry distributions. For a sample of 100 events following the distribution above, a likelihood analysis gives a best value for the slope parameter of $\hat{\theta} = 0.92$ (see Fig. 6). There is no simple statistic allowing to compute central classical 68.3% confidence limits because the parameter is undefined outside the interval $[-1, 1]$. Contrary to the conventional classical approach, the unified approach is able to handle the problem by working in the full sample space (hundred-dimensional in our case). This requires a considerable computing effort.¹

Likelihood limits are possible - the upper limit would coincide with the boundary - but are not well suited to measure the precision.
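
A likelihood analysis of the slope can be sketched as follows, assuming the linear density $(1 + \theta x)/2$ on $[-1, 1]$. The sampling by inversion of the cumulative distribution and the simple grid search are illustrative choices; all function names are hypothetical.

```python
import math
import random


def sample_linear(theta, n, rng):
    """Draw n values from f(x|theta) = (1 + theta*x)/2 on [-1, 1] by inverting the CDF."""
    xs = []
    for _ in range(n):
        u = rng.random()
        if abs(theta) < 1e-12:
            xs.append(2.0 * u - 1.0)     # flat distribution
        else:
            xs.append((-1.0 + math.sqrt((1.0 - theta) ** 2 + 4.0 * theta * u)) / theta)
    return xs


def log_likelihood(theta, xs):
    """Sum of ln(1 + theta*x); the constant factor 1/2 per event is dropped."""
    total = 0.0
    for x in xs:
        p = 1.0 + theta * x
        if p <= 0.0:
            return -math.inf             # density vanishes: parameter value excluded
        total += math.log(p)
    return total


def fit_slope(xs, steps=400):
    """Grid search for the maximum of the log-likelihood on the allowed range [-1, 1]."""
    grid = [-1.0 + 2.0 * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda theta: log_likelihood(theta, xs))


if __name__ == "__main__":
    rng = random.Random(7)
    events = sample_linear(0.9, 100, rng)
    print(fit_slope(events))
```

For true slopes close to the edge of the allowed range the maximum frequently lies near or at the boundary, so that the upper likelihood limit coincides with it.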



¹ In my presentation at the meeting I had not realized this solution in the unified approach. I thank Fred James and Gary Feldman for explaining it to me.

Fig. 7: Two-sided physical boundary. Classical error bounds cover the full physical region.



2.6 Example 4: Digital measurements

A particle track passes at the unknown position $\mu$ through a proportional wire chamber. The measured coordinate $x$ is set equal to the wire location $x_w$. The probability density for a measurement $x$,

$$f(x|\mu) = \delta(x - x_w),$$

is independent of the true location $\mu$. Thus it is impossible to define a sensible classical confidence or likelihood interval, except a trivial one with full overcoverage. This difficulty is common to all digital measurements because they violate condition 1 of Section 1.1. Thus a large class of measurements is not handled in classical statistics. A Bayesian treatment with uniform prior is the common solution. It provides the r.m.s. error pitch/$\sqrt{12}$.

2.7 Example 5: Gaussian with two physical boundaries

A particle passes through a small scintillator and another position sensitive detector with Gaussian resolution. Both boundaries of the classical error interval are in the region forbidden by the scintillator signal (see Fig. 7). The classical error is twice as large as the r.m.s. width. It is meaningless. The unified classical and the likelihood limits contain the full physical region and thus are useless. Again only the Bayesian method gives reasonable results.

2.8 Example 6: Gaussian with variable width

A theory, depending on the unknown parameter $\theta$, predicts the Gaussian probability density

$$f(t|\theta) = \frac{25}{\sqrt{2\pi}\,\theta^2} \exp\left(-\frac{625\,(t-\theta)^2}{2\theta^4}\right)$$

for the time $t$ of an earthquake. The classical confidence interval for a measurement at $t = 10$ h is $7.66 < \theta < \infty$. It is shown together with the likelihood function in Fig. 8. When we look at the two distinct parameter values, predicting the time of an earthquake

$H_1$: $t_1 = (7.50 \pm 2.25)$ h
$H_2$: $t_2 = (50 \pm 100)$ h

we realize that the first is excluded by the classical bounds, the second by the likelihood limits. Fig. 8b shows the two probability densities together with the measurement. Clearly, we would rather accept $H_1$. This choice is also supported by the likelihood ratio, which is in favor of $H_1$ by a factor 26. Thus the likelihood limits are intuitively more acceptable than the classical ones.

Fig. 8: Predictions from two discrete hypotheses $H_1$, $H_2$ and measurement (a) and log-likelihood for parametrization of the two hypotheses (b). The likelihood ratio strongly favors $H_1$, which is excluded by the classical confidence limits.

Fig. 9: A sequential stopping rule does not introduce a bias.
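
The numbers quoted for the earthquake example can be cross-checked with a short calculation. The sketch assumes a Gaussian with mean $\theta$ and width $\sigma(\theta) = \theta^2/25$, which reproduces the widths of 2.25 h and 100 h given for the two hypotheses; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def sigma_of(theta):
    """Assumed width law: 2.25 h at theta = 7.5 h and 100 h at theta = 50 h."""
    return theta ** 2 / 25.0


def likelihood(theta, t_obs):
    """Gaussian density of the observed time for the parameter value theta."""
    return NormalDist(theta, sigma_of(theta)).pdf(t_obs)


def classical_lower_limit(t_obs):
    """Solve theta + sigma_of(theta) = t_obs, i.e. theta^2 + 25*theta - 25*t_obs = 0."""
    return (-25.0 + math.sqrt(625.0 + 100.0 * t_obs)) / 2.0


t_obs = 10.0
ratio = likelihood(7.5, t_obs) / likelihood(50.0, t_obs)
lower = classical_lower_limit(t_obs)

if __name__ == "__main__":
    print(f"likelihood ratio H1/H2 = {ratio:.1f}")
    print(f"classical lower limit  = {lower:.2f} h")
```

Under this assumption the ratio comes out close to 26 and the lower limit close to 7.66 h. The corresponding equation for the upper limit, $\theta - \theta^2/25 = 10$, has no real solution, which is consistent with an upper limit at infinity.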

The preceding example shows that the concept of classical confidence limits for continuous parameters is not compatible with methods based on the likelihood values. We may construct a transition from the discrete case to the continuous one by adding more and more hypotheses, but a transition from likelihood based methods to CCL is impossible. The two classical approaches, CCL and the Neyman-Pearson test, lack a common basis.

2.9 Example 6b: Number of neutrinos

This example was presented by Cousins: Mark II had measured the number of neutrinos to be 2.8 ± 0.6 and deduced a 95% confidence upper limit of 3.9, excluding 4 neutrino generations. The likelihood ratio of 7.0 produces a much weaker exclusion of the discrete hypothesis.

2.10 Example 7: Stopping rule 

A rate measurement may be stopped for reasons like: i) There are enough events, ii) For a long time no 
event has been observed, iii) A "golden" event was recorded. 

These actions do not introduce a bias, as was first realized by Barnard and co-workers. The reason is that the likelihood function is independent of the stopping rule. This may be visualized by an infinitely long measurement which is cut in pieces, each corresponding to an experiment stopped by the same rule. The individual experiment cannot be biased since the full chain is unbiased. This is illustrated in Fig. 9, where the experiments are stopped whenever 3 events are recorded in a short time interval.

Figure 10 shows the likelihood function for an experiment where 4 events are observed in a time interval of one second. The classical results depend on the stopping condition: a) the time interval had been fixed, b) the experiment was stopped after the fourth event. The likelihood principle states that the two data sets are equivalent. Thus the classical limits are inconsistent.

The differences become even larger when we take the example of 1 event recorded in 1 second (see Fig. 10 right). The likelihood functions given by the lifetime distribution and the Poisson distribution, respectively, are proportional to each other:

$$f(t|\lambda) = \lambda e^{-\lambda t}$$
$$P(1|\lambda) = \lambda e^{-\lambda}$$
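
The dependence of classical limits on the stopping condition can be sketched for one event in one second. The example assumes one-sided 90% upper limits on the rate in both cases; this specific choice of limit and the function names are illustrative and are not taken from the paper's tables.

```python
import math


def poisson_prob_le(n, mean):
    """Probability to observe at most n events for a Poisson mean."""
    return sum(mean ** k * math.exp(-mean) / math.factorial(k) for k in range(n + 1))


def upper_limit_fixed_time(n=1, t=1.0, cl=0.90):
    """Rate with P(N <= n | rate) = 1 - cl when the observation time t was fixed."""
    lo, hi = 0.0, 100.0
    while hi - lo > 1e-10:
        mid = 0.5 * (lo + hi)
        if poisson_prob_le(n, mid * t) > 1.0 - cl:
            lo = mid          # probability still too large: the limit is higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


def upper_limit_first_event(t=1.0, cl=0.90):
    """Rate with P(T >= t | rate) = exp(-rate * t) = 1 - cl when stopping at the first event."""
    return -math.log(1.0 - cl) / t


def likelihood_ratio(rate, t=1.0):
    """Lifetime likelihood divided by the Poisson likelihood for one event in time t."""
    lifetime = rate * math.exp(-rate * t)
    poisson = (rate * t) * math.exp(-rate * t)
    return lifetime / poisson


if __name__ == "__main__":
    print(upper_limit_fixed_time())    # fixed time interval
    print(upper_limit_first_event())   # stopped after the first event
    print([likelihood_ratio(r) for r in (0.5, 1.0, 3.0)])  # constant in the rate
```

The two limits differ although the ratio of the two likelihood functions does not depend on the rate, which is the inconsistency discussed in the text.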




Fig. 10: Stopping after a fixed time or when a fixed number of events has been observed (same likelihood) gives different results in classical statistics.

Table 1: Poisson limits in classical and Bayesian approaches



2.11 Example 8: Poisson signal with background 

In a garden there are apple and pear trees. Usually during the night some pears fall from the trees. One morning, looking from his window, the proprietor, who is interested in apples, finds that no fruit is lying in the grass. Since it is still quite dark he is unable to distinguish apples from pears. He concludes that the average rate of falling apples per night is less than 2.3 with 90% confidence level. His wife, who is a classical statistician, tells him that his rate limit is too high because he has forgotten to subtract the expected pear background. He argues, "there are no pears", but she insists and explains to him that if he ignores the pears that could have been there but weren't, he would violate the coverage requirement. In the meantime it has become bright outside and pears and apples - which both are not there - are now distinguishable. Even though the evidence has not changed, the classical limit has.

The 90% confidence limit for zero events observed and background expectation $b = 0$ is $\mu = 2.3$. For $b = 2$ it is $\mu = 0.3$, much lower. CCL are different for two experiments with exactly the same experimental evidence relative to the signal (no signal event seen). This situation is absolutely intolerable. Feldman and Cousins consider this kind of objection as "based on a misplaced Bayesian interpretation of classical intervals" [4]. It is hard to detect a Bayesian origin in a generally accepted principle in science, namely, that two measurements containing the same information should give identical results. The criticism here is not that CCLs are inherently wrong but that their application to the computation of upper limits when background is expected does not make sense, i.e. these limits do not measure the precision of the experiment.

The effect is less dramatic but also present in the unified approach: An experiment finding no event ($n = 0$) with background expectation $b = 3$ produces a 90% confidence limit of 1.08 for the signal (see Table 1). Then the flux is doubled and the background is eliminated. The limit becomes 2.44/2 = 1.22, worse than before. This problem is absent in the versions proposed by Roe and Woodroofe and also in that of Punzi [10]. These methods are however restricted to the Poisson case.

To avoid the unacceptable situation, I have proposed a modified frequentist approach to the calculation of the Poissonian limits including the information of the limited number of background events [11]. There the confidence level is normalized to the probability to observe $n_b \le n$ background events, as known from the measurement. The resulting limits respect the likelihood principle (see below) and thus are consistent. They coincide with those of the uniform Bayesian method and provide a frequentist interpretation of the Bayesian limits. However, as has been pointed out by Highland [12], the limits do not have minimum overcoverage as required by the strict application of the Neyman construction. This is correct [13], but in my paper no claim relative to coverage had been made. The method has been applied to a Higgs search [14].

Table 2: Average of an infinite number of equivalent lifetime measurements using different weighting procedures
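
The difference between the conventional classical limit and a limit normalized to the probability of observing at most $n$ background events can be sketched as follows. The formula used for the normalized limit, the ratio of the Poisson sums for signal plus background and for background alone set equal to one minus the confidence level, is my reading of the procedure described above and should be treated as an assumption; all function names are hypothetical.

```python
import math


def poisson_cdf(n, mean):
    """Probability to observe at most n events for a Poisson mean."""
    return sum(mean ** k * math.exp(-mean) / math.factorial(k) for k in range(n + 1))


def solve_decreasing(func, target, lo=0.0, hi=100.0, tol=1e-10):
    """Argument at which a monotonically decreasing function reaches the target value."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if func(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def classical_upper_limit(n, b, cl=0.90):
    """Conventional limit: limit on signal plus background minus the expected background."""
    total = solve_decreasing(lambda m: poisson_cdf(n, m), 1.0 - cl)
    return total - b


def normalized_upper_limit(n, b, cl=0.90):
    """Limit with the tail probability normalized to P(at most n background events)."""
    norm = poisson_cdf(n, b)
    return solve_decreasing(lambda s: poisson_cdf(n, s + b) / norm, 1.0 - cl)


if __name__ == "__main__":
    for b in (0.0, 2.0, 3.0):
        print(b, classical_upper_limit(0, b), normalized_upper_limit(0, b))
```

For zero observed events the conventional limit shrinks with increasing background expectation, while the normalized limit stays at the value obtained without background.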

Often the background expectation is not known precisely since it is estimated from side bands or from other measurements with limited statistics. So far, there is no classical recipe that allows one to incorporate an uncertainty of the background estimate.

Likelihood limits also give a sensible description of the data. Whether likelihood limits or Bayesian 
limits obtained from the integration are more sensible depends on the shape of the likelihood function. 
Ideally both limits should be given. 

Fig. 11 compares the coverage of the unified classical and the Bayesian limits. At small signals 
both overcover strongly. For large signals the Bayesian method slightly undercovers and oscillates around 
the nominal value. 

2.12 Example 9: Combining lifetime measurements 

Two events are observed from an exponential decay with true mean life $\tau_0 = 1/\gamma_0$. The maximum likelihood estimate is used either for $\tau$ or $\gamma$. We assume that an infinite number of identical experiments is performed and that the results are combined. In Table 2 we summarize the results of different averaging procedures. There is no prescription for averaging classical intervals. The unified methods have to explain how they intend to combine their measurements. To compute the classical result given in the table, the maximum likelihood estimate and central intervals were used.

In this special example a consistent result is obtained in the Bayesian method with uniform prior 
for the decay constant. It shows also how critical the choice of the parameter is in the Bayesian approach. 
It is also clear that an educated choice is also important for the pragmatic procedures. It is obvious that 
the decay constant is the better parameter (see also Fig. 12). Methods approximating the likelihood 
function provide reasonable results unless the likelihood function is very asymmetric. The weighting 
procedure of the PDG applied to the likelihood errors gives reasonable results. As is well known, adding 
the log-likelihood functions always produces a correct result. 
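
The statement that adding the log-likelihood functions produces a correct result, while naive averaging of estimates may not, can be illustrated by a small simulation. The sketch uses unweighted averages of the maximum likelihood estimates only and does not attempt to reproduce the weighting procedures of Table 2; the function name, the number of simulated experiments and the seed are arbitrary choices.

```python
import random


def simulate(n_experiments=20000, events=2, gamma0=1.0, seed=3):
    """Combine many two-event lifetime experiments in three different ways."""
    rng = random.Random(seed)
    sum_tau_hat = 0.0
    sum_gamma_hat = 0.0
    total_time = 0.0
    for _ in range(n_experiments):
        t_sum = sum(rng.expovariate(gamma0) for _ in range(events))
        sum_tau_hat += t_sum / events      # ML estimate of the mean life
        sum_gamma_hat += events / t_sum    # ML estimate of the decay constant
        total_time += t_sum
    return {
        "mean_tau_hat": sum_tau_hat / n_experiments,
        "mean_gamma_hat": sum_gamma_hat / n_experiments,
        # Maximum of the summed log-likelihoods, sum over experiments of
        # (events * ln(gamma) - gamma * t_sum), located at gamma = N * events / total_time.
        "gamma_from_summed_loglik": n_experiments * events / total_time,
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(name, value)
```

With a true decay constant of one, the average of the mean life estimates and the estimate from the summed log-likelihoods both approach one, whereas the unweighted average of the decay constant estimates is strongly biased upwards.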

3. Conclusions 

3.1 Conventional classical method 

The conventional classical schemes suffer from the following problems: 

• There are inconsistencies (Poisson limits, stopping rule, discrete vs. continuous parameters).

• There is a lack of precision (unphysical limits).

• They have a restricted range of application (problems with digital measurements, discrete parameters).

• They are not invariant against sample variable transformations (except central intervals in one dimension).

• They are subjective (coverage requires pre-experimental fixing of cuts and decision to publish).

• There are unsolved problems. (It is not clear how to combine measurements. The inclusion of background errors in Poisson processes is not possible.)

• There is no obvious treatment of nuisance parameters.

• Systematic errors cannot be included.

Fig. 11: Coverage in the unified classical and the Bayesian approach (dotted) and interval lengths (bottom).

Fig. 12: Log-likelihood function of the mean life and the decay rate.

3.2 Unified approach 

Compared to the conventional method there are improvements: 

• The inconsistencies in Poisson processes are weaker (and absent in the version of Roe and Woodroofe).

• Non-physical limits are avoided. 

• It is invariant with respect to variable and parameter transformations. 

However most problems remain (inconsistencies, lack of precision, background uncertainty in 
Poisson limits), and: 

• It is restricted to specific pdfs (Gaussian like). 

• It is complicated and requires considerable computing efforts. 

• The combination of measurements is even more unclear. 

• Artificial error correlations are introduced near boundaries. 

• The proposed treatment [16] of nuisance parameters (use of the best estimate) may lead to undercoverage.

3.3 Likelihood limits 

Likelihood limits have attractive properties:

• They are consistent.

• They provide optimum precision.

• They are invariant against variable and parameter transformations.

• They provide a coherent transition to discrete hypotheses (likelihood ratio).

• Measurements can easily be combined.

There are also restrictions in the application:

• Digital measurements and uniform distributions cannot be handled.



3.4 Bayesian limits 

The Bayesian philosophy is very general and flexible: 

• All problems can be treated. (Nuisance parameters, digital measurements, unphysical boundaries 
etc.) 

but: 

• They depend on the parameter choice.

4. Proposed conventions

The conventions proposed here represent by no means the only reasonable prescription. 

Since the complete information is contained in the likelihood function, classical approaches are not considered. (They cannot be computed from the likelihood function alone.) An even stronger reason for their exclusion is the obvious inconsistency of these methods.

The main objection against Bayesian methods is their dependence on the selected parameter. I find it rather natural to choose a sensible parameter space. For some applications like pattern recognition - which, by the way, cannot be done with classical statistics - it is absolutely necessary.

The proposed conventions are: 

1. Whenever possible the full likelihood function should be published. It contains the experimental information and permits combining the results of different experiments in an optimum way. This is especially important when the likelihood is strongly non-Gaussian (strongly asymmetric, cut by external bounds, has several maxima etc.).

2. Data are combined by adding the log-likelihoods. When the log-likelihoods are not known, parametrizations are used to approximate them.

3. If the likelihood is smooth and has a single maximum, the likelihood limits should be given to define the error interval. These limits are invariant under parameter transformation. For the measurement of the parameter the value maximizing the likelihood function is chosen. No correction for biased likelihood estimators is applied. The errors usually are asymmetric. These limits can also be interpreted as Bayesian one standard deviation errors for the specific choice of the parameter variable where the likelihood of the parameter has a Gaussian shape.

4. Nuisance parameters are eliminated by integrating them out using a uniform prior. A correlation coefficient should be computed.

5. For digital measurements the Bayesian mean and r.m.s. should be used.

6. In cases where the likelihood function is restricted by physical or mathematical bounds and where there are no good reasons to reject a uniform prior, the measurement and its errors, defined as the mean and r.m.s., should be computed in the Bayesian way.

7. Upper and lower limits are computed from the tails of the Bayesian probability distributions. (In some cases likelihood limits may be more informative [15].)

8. Non-uniform prior densities should not be used.

9. It is the scientist's choice whether to present an error interval or an upper limit.

10. In any case the applied procedure has to be documented.
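
Conventions 5 and 6 can be sketched with a simple numerical integration of the likelihood over the allowed parameter range, which corresponds to a uniform prior. The midpoint rule, the function name `posterior_mean_rms` and the example values (wire pitch of 2, Gaussian measurement one standard deviation below a boundary at zero) are illustrative assumptions.

```python
import math


def posterior_mean_rms(likelihood, lower, upper, steps=20000):
    """Mean and r.m.s. of a likelihood normalized over [lower, upper] (midpoint rule)."""
    width = (upper - lower) / steps
    norm = first = second = 0.0
    for i in range(steps):
        theta = lower + (i + 0.5) * width
        weight = likelihood(theta)
        norm += weight
        first += weight * theta
        second += weight * theta * theta
    mean = first / norm
    variance = max(second / norm - mean * mean, 0.0)
    return mean, math.sqrt(variance)


def gaussian_likelihood(x, sigma):
    """Likelihood of the parameter mu for a Gaussian measurement x with resolution sigma."""
    return lambda mu: math.exp(-0.5 * ((x - mu) / sigma) ** 2)


if __name__ == "__main__":
    # Digital measurement: flat likelihood across one wire cell of width `pitch`.
    pitch = 2.0
    print(posterior_mean_rms(lambda mu: 1.0, -pitch / 2.0, pitch / 2.0))
    print(pitch / math.sqrt(12.0))

    # Gaussian measurement below a physical boundary at zero; the upper
    # integration bound of x + 10*sigma is a practical cut-off.
    x, sigma = -1.0, 1.0
    print(posterior_mean_rms(gaussian_likelihood(x, sigma), 0.0, x + 10.0 * sigma))
```

For the flat likelihood the r.m.s. reproduces the value pitch/$\sqrt{12}$, and for the bounded Gaussian the mean lies inside the physical region by construction.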

These recipes correspond more or less to our everyday practice. An exception is the case of Poisson limits, where for strange reasons the coverage principle - though only approximately realized - has gained preference in neutrino experiments.